Identifying Anomalies
Identify anomalies in the given dataset to further understand the process.
In anomaly detection systems, we usually want to know whether an anomaly is happening right now, so we can send an alert.
To identify if the last data point is an anomaly, we start by calculating the mean and standard deviation for each status code in the past hour:
To get the last value in a GROUP BY and the mean and standard deviation, we used a little array trick.
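For readers who want to see the same per-group calculation outside the database, here is a minimal Python sketch. It is not the query from this article: the rows, the column layout (status_code, minute, entries) and the function name summarize are assumptions made up for illustration, and the sample standard deviation is assumed. Taking the maximum of (minute, entries) pairs is one way to pick the latest value in a group without a separate sort.

```python
from collections import defaultdict
from statistics import mean, stdev

# Hypothetical sample rows: (status_code, minute, entries). Not the article's data.
rows = [
    (200, 1, 100), (200, 2, 104), (200, 3, 96),
    (400, 1, 0), (400, 2, 1), (400, 3, 24),
]


def summarize(rows):
    """Return {status_code: (mean, stddev, last_value)} for the given rows."""
    by_code = defaultdict(list)
    for status_code, minute, entries in rows:
        by_code[status_code].append((minute, entries))

    stats = {}
    for status_code, points in by_code.items():
        values = [entries for _, entries in points]
        # Sample standard deviation needs at least two data points.
        sd = stdev(values) if len(values) > 1 else 0.0
        # Pairs compare by minute first, so the max pair holds the latest value.
        last_value = max(points)[1]
        stats[status_code] = (mean(values), sd, last_value)
    return stats


stats = summarize(rows)
```

Each entry in stats holds the three numbers the next step needs: the mean, the standard deviation and the most recent value for that status code.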
Next, we calculate the z-score for the last value for each status code:
We calculated the z-score by finding the number of standard deviations between the last value and the mean. To avoid a “division by zero” error when the standard deviation is zero, we convert a zero denominator to NULL, so the resulting z-score is NULL instead of an error.
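The z-score step can be sketched in Python as follows. This is an illustration under assumptions, not the article's query: the function name z_score and its parameters are hypothetical, and None stands in for SQL's NULL.

```python
def z_score(last_value, mean_value, stddev_value):
    """Number of standard deviations between last_value and mean_value."""
    # Mirror the NULL idea: a zero denominator becomes None (SQL NULL).
    denominator = stddev_value if stddev_value != 0 else None
    if denominator is None:
        # In SQL, dividing by NULL yields NULL rather than raising an error.
        return None
    return (last_value - mean_value) / denominator
```

A status code whose count never changed in the past hour has a standard deviation of zero, so it gets no z-score at all rather than failing the whole calculation.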
Looking at the z-scores we got, we can spot that status code 400 received a very high z-score of 6. In the past minute, we returned a 400 status code 24 times, which is significantly higher than the average of 0.73 in the past hour.
Let’s take a look at the raw data:
It does look like we are getting more errors than expected in the last couple of minutes.
What our naked eye missed in the chart and the raw data was found by the query and was classified as an anomaly. We are off to a great start!